Install packages we will need.

Module 1 for William Norfolk

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.1     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.3     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# install.packages("ggplot2")
# install.packages("sp")
# install.packages("raster")
# install.packages("rgdal")
# install.packages("brew")
# install.packages("systemfonts")
# install.packages("mapview")
# install.packages("spatstat")
# install.packages("tmaptools")
# install.packages("elevatr")
# install.packages("rgbif")
# install.packages("remotes")
# remotes::install_github("Nowosad/spDataLarge")
# install.packages("spData")
# install.packages("gstat")
# install.packages("fields")
# install.packages("foreign")
# install.packages("velox")

What is a geographic information system (GIS)? While often thought of as means to making maps, the more basic utility of a GIS is the ability to overlay diverse spatial data layers. This is something that flat maps didn’t allow us to do, and is best gotten across with an example. Let’s start with tabular data on the point locations of detention basins (boring data that provides a great example!). In the process of working with this data you will be introduced to several topics (coordinate systems, vector and raster data) that we treat in more depth later.

Let’s first consider coordinates. Web services like Google Maps have locations across much of the world mapped to lat-long coordinates. When these locations are polygons (e.g. Athens-Clarke County), Google Maps has an associated centroid (the approximate geographic center of the polygon) and bounding box. So we can actually retrieve the coordinates of named locations by geocoding, basically an API call. Since Google now charges, we’ll use OpenStreetMap instead :)

API = applications program interface a set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service

library(tmaptools)
#call coordinates of the named location, in this case Athens based on the geocode_OSM function from tmaptools

acc<-geocode_OSM("Athens-Clarke County")
acc$coords
##         x         y 
## -83.35578  33.94587
acc$bbox
##      xmin      ymin      xmax      ymax 
## -83.53747  33.84802 -83.24083  34.03947
#testing another generic example
x <- geocode_OSM("Rio de Janerio")
x$coords
##         x         y 
## -75.97818  36.79028
x$bbox
##      xmin      ymin      xmax      ymax 
## -75.97843  36.78985 -75.97779  36.79065

I have a bunch of herbarium data on the occurrences of rare plants associated with granite outcrops in the Southeast. Unfortunately, until recently, botanists collecting these specimens did not report coordinates. Nevertheless, many times location names can be used to generate coordinates. Many names are not unique, but usually are if combined with county and state. Let’s see if we can get correct coordinates for Echols Mill (now quarried) nearby in Oglethorpe county using just the place name.

em1<-geocode_OSM("Echols Mill")
em2<-geocode_OSM("Echols Mill, Oglethorpe, Georgia")

#pretty close but not exactly the same without county and state
em1
## $query
## [1] "Echols Mill"
## 
## $coords
##         x         y 
## -82.90788  30.69680 
## 
## $bbox
##      xmin      ymin      xmax      ymax 
## -83.13712  30.59162 -82.57942  30.87262
em2
## $query
## [1] "Echols Mill, Oglethorpe, Georgia"
## 
## $coords
##         x         y 
## -83.00607  33.97512 
## 
## $bbox
##      xmin      ymin      xmax      ymax 
## -83.00639  33.97235 -83.00295  33.97593

When we don’t supply more detail, we get the centroid of Echols county in south Georgia on the Florida line. But when we do, we get the Echols Mill I had in mind. If you’re geocoding, you will definitely want to doublecheck your coordinates by overlaying them on a map of the region of interest to make sure the API is making the right associations! Below, we’ll see how to overlay points on polygons to do a spatial join to collect the info you would need to check against for accuracy.

To illustrate, let’s bring in a .csv that contains the point locations of detention basins in Sandy Springs, which is near Atlanta in Fulton county (http://www.fultoncountyga.gov/fcgis-geospatial-data).

# Bring in Sandy Springs detention basin points from Fulton County GIS website
#update path to relative folder in directory

dbasins<-read.csv("../IDEAS_Spatial_Ecology_Workshop/Stormwater_Basins__Ponds__Hosted___Sandy_Springs_Georgia.csv")

# Take a look at the data
head(dbasins)
##           X        Y OBJECTID BasinType BasinSurf DateCreated ConstructType
## 1 -84.35676 33.88007     2801  Wet Pond   Surface        1996         Riser
## 2 -84.35658 33.88119     2802  Dry Pond   Surface        2007     Weir Wall
## 3 -84.35635 33.88207     2803  Dry Pond   Surface        1989     Weir Wall
## 4 -84.35419 33.88468     2804  Dry Pond   Surface        2001         Riser
## 5 -84.35284 33.88555     2805  Dry Pond   Surface        1989     Weir Wall
## 6 -84.35674 33.88303     2806  Wet Pond   Surface        1999         Riser
##   DamHeight
## 1         8
## 2         5
## 3         6
## 4        10
## 5         3
## 6         5

Notice that there is an X and Y column giving the longitude and latitude of the location of each pond. But also a series of attributes for each pond, both categorical and continuous.

#add for data viewing
glimpse(dbasins)
## Rows: 1,261
## Columns: 8
## $ X             <dbl> -84.35676, -84.35658, -84.35635, -84.35419, -84.35284, …
## $ Y             <dbl> 33.88007, 33.88119, 33.88207, 33.88468, 33.88555, 33.88…
## $ OBJECTID      <int> 2801, 2802, 2803, 2804, 2805, 2806, 2807, 2808, 2809, 2…
## $ BasinType     <fct> Wet Pond, Dry Pond, Dry Pond, Dry Pond, Dry Pond, Wet P…
## $ BasinSurf     <fct> Surface, Surface, Surface, Surface, Surface, Surface, S…
## $ DateCreated   <int> 1996, 2007, 1989, 2001, 1989, 1999, 1989, 1989, 2007, 2…
## $ ConstructType <fct> Riser, Weir Wall, Weir Wall, Riser, Weir Wall, Riser, R…
## $ DamHeight     <dbl> 8.00, 5.00, 6.00, 10.00, 3.00, 5.00, 10.00, 5.00, 5.50,…
# What are the categories?
# the categorical variables (factors) are:
#ConstructType-presumably the basin walling type
#BasinType-what the basin is
#BasinSurf-presumable the water basin position, either surface water or underground

unique(dbasins$BasinType)
## [1] Wet Pond  Dry Pond  Other BMP
## Levels: Dry Pond Other BMP Wet Pond
unique(dbasins$BasinSurf)
## [1] Surface           Underground Vault Parking Lot       N/A              
## Levels: N/A Parking Lot Surface Underground Vault
unique(dbasins$ConstructType)
## [1] Riser          Weir Wall      Stand Pipe     Concrete Flume Earthen Flume 
## [6] N/A           
## Levels: Concrete Flume Earthen Flume N/A Riser Stand Pipe Weir Wall

We can explore these attributes without considering space. How many ponds by type, whether surface or underground, and construction type?

# Aggregate data by categories
#makes a seperate dataframe with just the three categorical variables
dbtypes<-with(dbasins,table(BasinType,BasinSurf,ConstructType)) 
dbtypes<-as.data.frame(dbtypes)
head(dbtypes)
##   BasinType   BasinSurf  ConstructType Freq
## 1  Dry Pond         N/A Concrete Flume    0
## 2 Other BMP         N/A Concrete Flume    0
## 3  Wet Pond         N/A Concrete Flume    0
## 4  Dry Pond Parking Lot Concrete Flume    0
## 5 Other BMP Parking Lot Concrete Flume    0
## 6  Wet Pond Parking Lot Concrete Flume    0
#The resulting object shows the frequecy of all permutations of each factor level in all of the variables so you can view how many basins fall into BasinType[x], BasinSurf[y] and ConstructType[z] where x, y, and z are each factor level present in the variable. 

We can also graph non-spatial summaries of the attribute data. First by categories.

# Load graphics package
library(ggplot2)

#visualizes summary statistics determined above

# Make barplot of frequencies across categories
ggplot(data=dbtypes, aes(y=Freq,x=BasinType,fill=ConstructType)) + 
  geom_bar( stat="identity") + 
  facet_grid(~BasinSurf) + 
  labs(y = "Count") + 
  labs(x = "") + 
  theme(legend.position="bottom", 
        axis.text.x = element_text(angle = 45, hjust = 1)) #add angles to read x-axis

Now by categorical and continuous fields.

# Get age
dbasins$Age<-2019-dbasins$DateCreated #calculates age by subtracting the DateCreated variable from 2019.

# Show age range by construction type 
ggplot(data=dbasins, aes(y=Age,x=ConstructType,fill=ConstructType)) + 
  geom_violin(alpha = 0.5) + coord_flip() +
  theme(legend.position="") +
  labs(x = "")
## Warning: Removed 30 rows containing non-finite values (stat_ydensity).

To make use of the spatial component of the data in R, we need to convert the table to a spatial points dataframe (SPDF). I discuss data formats in more detail in the zoom recording.

# load sp package
library(sp) #Classes and Methods for Spatial Data Library

# First let's look at arguments for the command SpatialPointsDataFrame
?SpatialPointsDataFrame
#basically a special class of R objects for coordinate data frames

To convert to SPDF, we need the CORRECT descripton of the coordinate system.

# In this case the coordinates are lat-long so the value of the argument is

# CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")-this is the proj4string argument answer. 

#How are these designated? or is this a standard format for Lat-Long coordinates?

dbasin_pts<-SpatialPointsDataFrame(dbasins[,1:2],
              dbasins, 
              coords.nrs = numeric(0), 
              proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))

Now let’s take a quick look to make sure the points seem okay. The plot shows basically the shape of north Fulton county which is where the points should be.

#Neat!
plot(dbasin_pts)

Let’s look at the description of the SPDF. Notice that we now have points that can be displayed in 2D space that also have attributes linked to them.

dbasin_pts
## class       : SpatialPointsDataFrame 
## features    : 1261 
## extent      : -84.44169, -84.25951, 33.87834, 34.00883  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 9
## names       :            X,           Y, OBJECTID, BasinType,         BasinSurf, DateCreated,  ConstructType, DamHeight, Age 
## min values  : -84.44169472, 33.87833621,     2801,  Dry Pond,               N/A,        1950, Concrete Flume,         0,   1 
## max values  : -84.25950623, 34.00882877,    11812,  Wet Pond, Underground Vault,        2018,      Weir Wall,       999,  69

Similar to the geocoding example above, with these spatial points, we can make API calls through R packages to acquire, for example, elevation data from the USGS Elevation Point Query Service.

#elevatr allpws access to elevation data
library(elevatr)
dbasins_subset<-dbasin_pts[1:10,] #subset for simplicity
dbasins_subset<-get_elev_point(dbasins_subset) #assign elevation points to subset
## Note: Elevation units are in &units=Meters 
## Note:. The coordinate reference system is:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
dbasins_subset #now have elevation 
## class       : SpatialPointsDataFrame 
## features    : 10 
## extent      : -84.35676, -84.34985, 33.88007, 33.88555  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 11
## names       :            X,           Y, OBJECTID, BasinType,         BasinSurf, DateCreated, ConstructType, DamHeight, Age, elevation, elev_units 
## min values  : -84.35675673, 33.88007369,     2801,  Dry Pond,           Surface,        1989,         Riser,         3,  12,     255.8,     meters 
## max values  : -84.34984609, 33.88555104,     2810,  Wet Pond, Underground Vault,        2007,     Weir Wall,        10,  30,    283.14,     meters

Now let’s bring in another layer. A shapefile of subwatershed polygons from USGS (U.S. Geological Survey, National Geospatial Program, 20200505, USGS National Hydrography Dataset Best Resolution (NHD) for Hydrologic Unit (HU) 8 - 03130001 (published 20200505): U.S. Geological Survey.). I downloaded this data using the National Map viewer website (https://viewer.nationalmap.gov/advanced-viewer/).

Note that the shapefile has become the standard format in which vector GIS data is usually served. Although developed originally by ESRI, it has become non-proprietary. IMPORTANT to note that a single shapefile is really a set of linked files (.shp, .dbf, .shx, .prj) that are all required, but are all called by using “filename.shp”.

# Load raster package
library(raster)
## 
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
## 
##     select
## The following object is masked from 'package:tidyr':
## 
##     extract
# Bring in the 12 digit hydrologic units (HUC)
hucs12<- raster::shapefile("../IDEAS_Spatial_Ecology_Workshop/Hydrologic_Units_Subwatershed.shp")

What is the coordinate system of the hucs? Is it also lat-long (aka unprojected, aka geographic)? How can we find out?

The coordinate system is lat-long base don the information in the project4string. We can find this information by calling the command entered below, or viewing the object from the environment and selecting projargs dropdown to view.

#view the projargs
hucs12@proj4string@projargs
## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
# project4string = +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
# string represent lat-long (also included in syntax)

Now let’s overlay these layers to browse the data in R. You can zoom in or out and clip on points or polygons to see the attributes.

# Load mapview package to interactively view layers
library(mapview)

# View
mapview::mapview(dbasin_pts) + mapview::mapview(hucs12)
## Warning: sf layer has inconsistent datum (+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs).
## Need '+proj=longlat +datum=WGS84'

## Warning: sf layer has inconsistent datum (+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs).
## Need '+proj=longlat +datum=WGS84'

Let’s add the name of the 12 digit subwatershed to the attributes of the detention basin points to allow us to look at differences between subwatersheds. This is known as a spatial join.

# Get HUC 12 watershed name of each retention basin
dbn_hucs<-over(dbasin_pts,hucs12) #add 12 digit identifiers
head(dbn_hucs)
##   OBJECTID    HUC_8     HUC_10       HUC_12 ACRES NCONTRB_A HU_10_GNIS
## 1      469 03130001 0313000112 031300011203 24078         0       <NA>
## 2      469 03130001 0313000112 031300011203 24078         0       <NA>
## 3      469 03130001 0313000112 031300011203 24078         0       <NA>
## 4      469 03130001 0313000112 031300011203 24078         0       <NA>
## 5      469 03130001 0313000112 031300011203 24078         0       <NA>
## 6      469 03130001 0313000112 031300011203 24078         0       <NA>
##   HU_12_GNIS   HU_10_DS      HU_10_NAME HU_10_MOD HU_10_TYPE     HU_12_DS
## 1       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 2       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 3       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 4       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 5       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
## 6       <NA> 0313000111 Peachtree Creek        UA          S 031300011204
##    HU_12_NAME HU_12_MOD HU_12_TYPE META_ID STATES
## 1 Nancy Creek        UA          S    GA04     GA
## 2 Nancy Creek        UA          S    GA04     GA
## 3 Nancy Creek        UA          S    GA04     GA
## 4 Nancy Creek        UA          S    GA04     GA
## 5 Nancy Creek        UA          S    GA04     GA
## 6 Nancy Creek        UA          S    GA04     GA
##                                 GlobalID ShapeSTAre ShapeSTLen
## 1 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 2 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 3 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 4 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 5 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
## 6 {0B99DAC4-D3A3-42CF-BD1E-9F0E1D584501} 1047009918   187258.2
# Add HUC 12 name to detention basin SPDF
dbasin_pts@data<-cbind(dbasin_pts@data,dbn_hucs$HU_12_NAME)

# Let's look at the dbasin.pts attribute table now
dbasin_pts
## class       : SpatialPointsDataFrame 
## features    : 1261 
## extent      : -84.44169, -84.25951, 33.87834, 34.00883  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 10
## names       :            X,           Y, OBJECTID, BasinType,         BasinSurf, DateCreated,  ConstructType, DamHeight, Age,               dbn_hucs$HU_12_NAME 
## min values  : -84.44169472, 33.87833621,     2801,  Dry Pond,               N/A,        1950, Concrete Flume,         0,   1, Crooked Creek-Chattahoochee River 
## max values  : -84.25950623, 34.00882877,    11812,  Wet Pond, Underground Vault,        2018,      Weir Wall,       999,  69,                       Nancy Creek

Now, we can write out dbasin_pts as a shapefile that can be brought in to other GIS software and will retain the new information we have collected.

library(raster)
#relative to data folder
shapefile(dbasin_pts,"../IDEAS_Spatial_Ecology_Workshop/dbasin_pts.shp", overwrite=T)
## Warning in rgdal::writeOGR(x, filename, layer, driver = "ESRI Shapefile", :
## Field names abbreviated for ESRI Shapefile driver

Or, the dbasin_pts spatial points data frame can be stored within a saved R workspace or .RData file. Please save this workspace so that you can load it in the next exercise.

#drop in mapping folder
save.image("intro.RData")

We have just worked through a simple example of overlaying spatial data layers in order to transfer data from one layer to another.